Data

Predictors: body mass interacting with habitat.

Read in data

mammal_hr <- read.csv("data/hr.csv") 

Notes

  • log10hr is the base-10 logarithm of the heart rate in beats/min

Cleaning

Rename some variables and create species name from genus and species.

mammal_hr <- mammal_hr %>%
  rename(source = Source,
         order = order.corrected,
        # n.animals = Number.of.Animals,
         mass.kg = body.mass..kg.) %>%
  mutate(order = str_to_title(order),
         genus = str_to_title(genus),
         animal = paste(genus, species),
         above.10kg = ifelse(mass.kg >= 10, 'Larger', 'Smaller'))

Keep only variables we will be using. And “factor” “chr” variables.

mammal_hr <- mammal_hr %>%
  dplyr::select(order,
                genus,
                species, 
                common.name,
                mass.kg,
                log10.body.mass,
                log10hr,
                habitat,
                source,
                animal,
                above.10kg
         ) %>%
  mutate(across(where(is.character), factor)) %>%
  arrange(order, genus, species)

Viz

A few quick graphs just to make sure the data are looking as we expect (error checking).

my_scatter_plot <- gf_point(log10hr ~ log10.body.mass | habitat,
         color = ~order,
         # size = ~parse_number(n_animals),
         data = mammal_hr,
         alpha = 0.5) %>%
  gf_theme(legend.position = 'bottom',
           legend.title = element_text(size = 8),
           legend.text = element_text(size = 6)) %>%
  gf_theme(scale_color_viridis_d('Order')) %>%
  gf_labs(x = 'Log10(fH (beats/min))',
          y = 'Log10(BMR (kcal/day))') 
my_scatter_plot

plotly::ggplotly(my_scatter_plot) %>%
  plotly::layout(legend = list(#orientation = 'h',
                               font = list(size = 6)))

Notes: If we wanted to use for web or publication, we would refine. We can edit what info is shown when you hover over a data point, change color scheme, legend, etc.

GLS

Will not account for phylogeny at all in the model structure. Predictors: body mass interacting with habitat. Weight by number of individuals (if have data later, don’t have it now)?

lm_model <- lm(log10hr ~ log10.body.mass * habitat,
                 data = mammal_hr)
summary(lm_model)
## 
## Call:
## lm(formula = log10hr ~ log10.body.mass * habitat, data = mammal_hr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35008 -0.09705 -0.00279  0.07544  0.46383 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  2.30719    0.04970  46.419   <2e-16 ***
## log10.body.mass             -0.19865    0.01931 -10.288   <2e-16 ***
## habitatland                  0.01562    0.05638   0.277    0.782    
## log10.body.mass:habitatland -0.01550    0.02415  -0.642    0.523    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1371 on 85 degrees of freedom
## Multiple R-squared:  0.8096, Adjusted R-squared:  0.8028 
## F-statistic: 120.4 on 3 and 85 DF,  p-value: < 2.2e-16
tab_model(lm_model) 
  log 10 hr
Predictors Estimates CI p
(Intercept) 2.31 2.21 – 2.41 <0.001
log10.body.mass -0.20 -0.24 – -0.16 <0.001
habitat [land] 0.02 -0.10 – 0.13 0.782
log10.body.mass * habitat
[land]
-0.02 -0.06 – 0.03 0.523
Observations 89
R2 / R2 adjusted 0.810 / 0.803
lm_anova_results <- car::Anova(lm_model)
pander(lm_anova_results)
Anova Table (Type II tests)
  Sum Sq Df F value Pr(>F)
log10.body.mass 6.081 1 323.3 1.043e-30
habitat 0.003741 1 0.1989 0.6567
log10.body.mass:habitat 0.007742 1 0.4116 0.5229
Residuals 1.599 85 NA NA

Model Assessment

lm_preds <- predict(lm_model, se.fit = TRUE)

mammal_hr <- mammal_hr  %>% mutate(lm_resids = resid(lm_model),
         lm_fitted = lm_preds$fit,
         lm_fit_lo = lm_preds$fit + 1.96*lm_preds$se.fit,
         lm_fit_hi = lm_preds$fit - 1.96*lm_preds$se.fit)
gf_point(lm_resids ~ lm_fitted, data = mammal_hr)

s245::gf_acf(~lm_model)

gf_dhistogram(~lm_resids, data = mammal_hr,
              bins = 17) %>%
  gf_fitdistr()

Model Predictions

Any predictors not shown in a plot are held constant at their mean or most common value.

gf_line(lm_fitted ~ log10.body.mass,
         color = ~habitat,
         data = mammal_hr) %>%
  gf_ribbon(lm_fit_lo + lm_fit_hi ~ log10.body.mass,
            color = ~habitat, fill = ~habitat) %>% 
  gf_theme(scale_color_manual(values = my_colors)) %>%
  gf_theme(scale_fill_manual(values = my_colors)) 

gf_point(log10hr ~ lm_fitted, data = mammal_hr,
         alpha = 0.1) %>%
  gf_labs(y = 'Observed log10(fH)',
          x = 'Model-predicted log10(fH)',
          title = 'Linear Regresssion (no phylogeny)') %>%
  gf_abline(slope = 1, intercept = 0, color = 'black', linetype = 'dashed')

Can refine the plot of predictions later on.

Mixed-effects model

Will include nested random effects of order/family/genus/species, expecting similarity of observations based on a hierarchy of phylogenetic relatedness, but not in a specified/structured way; only groupings are used, with no sense of (for example) the fact that two orders are thought to be “further” apart than any other two.

re_model <- glmmTMB(log10hr ~ log10.body.mass * habitat + (1 | order/genus),
                 data = mammal_hr)
summary(re_model)
##  Family: gaussian  ( identity )
## Formula:          log10hr ~ log10.body.mass * habitat + (1 | order/genus)
## Data: mammal_hr
## 
##      AIC      BIC   logLik deviance df.resid 
##    -92.5    -75.1     53.2   -106.5       82 
## 
## Random effects:
## 
## Conditional model:
##  Groups      Name        Variance  Std.Dev. 
##  genus:order (Intercept) 5.435e-03 7.372e-02
##  order       (Intercept) 2.571e-12 1.604e-06
##  Residual                1.271e-02 1.128e-01
## Number of obs: 89, groups:  genus:order, 69; order, 10
## 
## Dispersion estimate for gaussian family (sigma^2): 0.0127 
## 
## Conditional model:
##                             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                  2.30472    0.04946   46.59   <2e-16 ***
## log10.body.mass             -0.19728    0.01945  -10.14   <2e-16 ***
## habitatland                  0.01622    0.05535    0.29    0.769    
## log10.body.mass:habitatland -0.01504    0.02417   -0.62    0.534    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tab_model(re_model) 
  log 10 hr
Predictors Estimates CI p
(Intercept) 2.30 2.21 – 2.40 <0.001
log10.body.mass -0.20 -0.24 – -0.16 <0.001
habitat [land] 0.02 -0.09 – 0.12 0.769
log10.body.mass * habitat
[land]
-0.02 -0.06 – 0.03 0.534
Random Effects
σ2 0.01
τ00 genus:order 0.01
τ00 order 0.00
N genus 69
N order 10
Observations 89
Marginal R2 / Conditional R2 0.857 / NA
re_anova_results <- car::Anova(re_model)
pander(re_anova_results)
Analysis of Deviance Table (Type II Wald chisquare tests)
  Chisq Df Pr(>Chisq)
log10.body.mass 298 1 8.87e-67
habitat 0.1295 1 0.7189
log10.body.mass:habitat 0.3871 1 0.5338

Note: for this model the random effect of species is excluded as we don’t have enough species with more than one observation to fit it well.

Model Assessment

re_ave_preds <- predict(re_model, 
                    se.fit = TRUE,
                    re.form = ~0)
re_ind_preds <- predict(re_model,
                        se.fit = TRUE,
                        re.form = NULL)
mammal_hr <- mammal_hr %>%
  mutate(re_resids = resid(re_model),
         re_ind_fitted = re_ind_preds$fit,
         re_ave_fitted = re_ave_preds$fit,
         re_ave_lo = re_ave_preds$fit - 1.96*re_ave_preds$se.fit,
         re_ave_hi = re_ave_preds$fit + 1.96*re_ave_preds$se.fit)

# save fitted model and data
saveRDS(mammal_hr, 'fitted-models/hr-data.RDS')
saveRDS(re_model, 'fitted-models/hr-re-model.RDS')

gf_point(re_resids ~ re_ind_fitted, data = mammal_hr)

acf(resid(re_model))

#gf_acf(~re_model)
gf_dhistogram(~re_resids, data = mammal_hr, bins = 15) %>%
  gf_fitdistr()

Predictions from Model

gf_line(re_ave_fitted ~ log10.body.mass,
         color = ~habitat,
         data = mammal_hr) %>%
  gf_ribbon(re_ave_lo + re_ave_hi ~ log10.body.mass,
            color = ~habitat, fill = ~habitat) %>% 
  gf_theme(scale_color_manual(values = my_colors)) %>%
  gf_theme(scale_fill_manual(values = my_colors))

gf_point(log10hr ~ re_ind_fitted, data = mammal_hr,
         alpha = 0.1) %>%
  gf_labs(y = 'Observed log10(fH)',
          x = 'Model-predicted, Species-specific log10(fH)',
          title = 'Mixed-effects Model (RE of Order/Genus)') %>%
  gf_abline(slope = 1, intercept = 0, color = 'black', linetype = 'dashed')

PGLS

Read in Tree Data

#Read in the trees from Upham et al

tree_path <- paste("data/upham-trees/Completed_5911sp_topoCons_FBDasZhouEtAl")
tree_files <- list.files(tree_path)

all_trees <- list()

for (i in 1:length(tree_files)){
  all_trees[[i]] <- read.tree(paste0(tree_path, '/',
                                     tree_files[i]))
  if (i == 1){
    treeset <- all_trees[[i]]
  }else{
    treeset <- c(treeset, all_trees[[i]])
  }
}

all_tip_labels <- purrr::map(treeset, "tip.label")
all_tip_labels <- purrr::map(all_tip_labels,
                             function(x) 
                               stringr::str_replace_all(x, pattern = '_',
                                                      replacement = ' '))

# get list of species that are in ALL the trees
for (t in 1:length(all_tip_labels)){
  if (t == 1){
    tip_labs <- all_tip_labels[[t]]
  }else{
    tip_labs <- intersect(tip_labs, all_tip_labels[[t]])
  }
}

                            
# keep only the species in mammal_hr that are in all the trees
# on 4/14 this removes 1 species.
pgls_data <- mammal_hr %>%
  filter(animal %in% tip_labs) %>%
  droplevels()


taxonomy <- read_csv('data/upham-trees/taxonomy_mamPhy_5911species.csv')

Fit models, one model for every tree in our list.

pgls_models <- list()

for (t in c(1:length(treeset))){
  # make sure there is only one row of data per species (seems dubious??)
  pgls_rep_data <- pgls_data %>% 
    group_by(animal) %>%
    sample_n(1) %>%
    ungroup
  
  
  #Reduce the tree to only include those species in the data set
  refit_tree <- treeset[[t]]
  refit_tree$tip.label <- str_replace_all(refit_tree$tip.label, '_', ' ')
  refit_tree <- drop.tip(refit_tree, 
                         setdiff(refit_tree$tip.label, 
                                 levels(pgls_rep_data %>% pull(animal))))
  
  #Order the data set so that it is in the same order as the tip labels of the tree
  pgls_rep_data <- left_join(data.frame(tree.tip.label = refit_tree$tip.label),
                             pgls_rep_data,
                             by = c('tree.tip.label' = 'animal'),
                             keep = TRUE)
  
  # fit the model
  pgls_models[[t]] <- tryCatch({
    fittd <- gls(log10hr ~ log10.body.mass * habitat,
                 correlation = corPagel(value = 0.8, 
                                        phy = refit_tree, 
                                        fixed = FALSE, 
                                        form = ~animal),
                 data = pgls_rep_data)
  },
  error = function(cond){
    message(paste('PGLS fit failed for tree', t))
    return(NULL)
  }
  )
}

pgls_models <- pgls_models[!sapply(pgls_models, is.null)]

Note: we tried to fit 101 PGLS models (each with a different tree); of these, model fitting failed for 18.

Combine the many PGLS model runs together into one summary combined model according to Rubin’s rule.

# as.mira takes the list of models and create an object to be used by the mice package
pgls_mira <- as.mira(pgls_models)  
# # pool summarise the models using Rubin's rule corrected for small samples
pooled_pgls   <- pool(pgls_mira)
pooled_pgls_summ <- summary(pooled_pgls, type = 'all', conf.int = TRUE)

pander(pooled_pgls_summ %>% dplyr::select(term, estimate, std.error, `2.5 %`, `97.5 %`, lambda, fmi))
Table continues below
term estimate std.error 2.5 % 97.5 %
(Intercept) 2.286 0.06278 2.161 2.412
log10.body.mass -0.1992 0.01926 -0.2376 -0.1609
habitatland 0.01327 0.05719 -0.1006 0.1271
log10.body.mass:habitatland -0.01462 0.02445 -0.06329 0.03406
lambda fmi
0.1892 0.2137
0.009631 0.0337
0.01878 0.04285
0.01719 0.04126
pgls_anova <- Anova(pgls_mira)
pander(pgls_anova)
Analysis of Deviance Table (Type II tests) (continued below)
  F num df den df missing info
log10.body.mass 276.6 1 119400 0.02495
habitat 0.1857 1 43636 0.04129
log10.body.mass:habitat 0.3574 1 251385 0.01719
  riv Pr(>F)
log10.body.mass 0.02559 4.816e-62
habitat 0.04307 0.6665
log10.body.mass:habitat 0.01749 0.55

Alternative approach: using MuMIn to do model averaging. This will not necessarily weight all of the models/trees equally?

pgls_avg <- model.avg(pgls_models, 
                      rank = function(x) 1)

This object is one that we can better make predictions from. The other way (with pool()) is better for getting the ANOVA results.

For the PGLS model, we also want to extract the estimate of \(\Lambda\), which tells us about how the phylogeny is affecting the correlation structure.

Previous approach was to take the mean of the estimates of \(\Lambda\) from all the individual fitted models.

lambda <- mean(unlist(purrr::map(pgls_models, function(x) x$modelStruct$corStruct)))
lambda
## [1] 0.0271769

According to this simple method our estimate is \(\hat{\Lambda} =\) 0.027.

Model Assessment

It’s not really clear how to even approach this, since we don’t really any longer expect the residuals to be normal or independent. And based on the data we know there’s not a huge issue with linearity. So…ok?